Tracking unbounded Topic Streams
Authors
Abstract
Tracking topics on social media streams is non-trivial because the number of topics mentioned grows without bound. This complexity is compounded when we want to track such topics against other fast-moving streams. We go beyond traditional small-scale topic tracking and consider tracking a stream of topics against another document stream. We introduce two tracking approaches that are fully applicable to true streaming environments. When tracking 4.4 million topics against 52 million documents in constant time and space, we demonstrate that, counter to expectations, simple single-pass clustering can outperform locality-sensitive hashing for nearest-neighbour search on streams.
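As a rough illustration of what single-pass clustering over a stream can look like, here is a minimal Python sketch. It is not the authors' implementation: the class name `SinglePassClusterer`, the cosine similarity measure, the `threshold` and `max_clusters` parameters, and the least-recently-updated eviction rule are all assumptions made for the example. Each incoming item is compared once against the current centroids and either joins the best match or starts a new cluster. The cap on the number of clusters bounds the work per item, although the centroid vectors themselves can still grow in this sketch.

```python
from collections import OrderedDict
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SinglePassClusterer:
    """Hypothetical single-pass (leader-style) clusterer with a cluster cap."""

    def __init__(self, threshold=0.5, max_clusters=1000):
        self.threshold = threshold        # assumed similarity cut-off
        self.max_clusters = max_clusters  # assumed cap on stored clusters
        self.centroids = OrderedDict()    # cluster id -> summed term vector
        self._next_id = 0

    def add(self, vec):
        """Assign one item to a cluster, seeing it exactly once."""
        best_id, best_sim = None, 0.0
        for cid, centroid in self.centroids.items():
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim

        if best_id is not None and best_sim >= self.threshold:
            # Fold the item into the closest centroid.
            centroid = self.centroids[best_id]
            for t, w in vec.items():
                centroid[t] = centroid.get(t, 0.0) + w
            self.centroids.move_to_end(best_id)  # mark as recently updated
            return best_id

        # No centroid is close enough: start a new cluster.
        cid = self._next_id
        self._next_id += 1
        self.centroids[cid] = dict(vec)
        if len(self.centroids) > self.max_clusters:
            self.centroids.popitem(last=False)  # evict least recently updated
        return cid
```

A locality-sensitive hashing approach would instead hash each item into buckets and compare it only against items sharing a bucket. The sketch above scans every stored centroid, which stays cheap only because the number of centroids is capped.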
Similar resources
The Stor-e-Motion Visualization for Topic Evolution Tracking in Social Media Streams
Nowadays, there are plenty of sources generating massive amounts of text streams in a continuous way. For example, the increasing popularity and the active use of social networks result in voluminous and fast-flowing data streams containing a large amount of user-generated data about almost any topic around the world. However, the observation and tracking of the ongoing evolution of topics in ...
Topic Detection & Tracking (TDT) Overview & Perspective
Topic Detection and Tracking (TDT) refers to automatic techniques for finding topically related material in streams of data (e.g., newswire and broadcast news). Work on TDT began about a year ago, is now expanding, and will be a regular feature at future Broadcast News workshops.
The Stor-e-Motion Visualization for Topic Evolution Tracking in Text Data Streams
Nowadays, there are plenty of sources generating massive amounts of text data streams in a continuous way. For example, the increasing popularity and the active use of social networks result in voluminous and fast-flowing text data streams containing a large amount of user-generated data about almost any topic around the world. However, the observation and tracking of the ongoing evolution of to...
Towards Online Concept Drift Detection with Feature Selection for Data Stream Classification
Data streams are unbounded, sequential data instances that are generated very rapidly. The storage, querying and mining of such rapid flows of data are computationally very challenging. Data Stream Mining (DSM) is concerned with the mining of such data streams in real time using techniques that require only one pass through the data. DSM techniques need to be adaptive to reflect changes of the p...
Emerging User Intentions: Matching User Queries with Topic Evolution in News Text Streams
Topic and event evolution analysis aimed at trend detection and tracking (TDT) from news data streams has gained considerable interest in recent years. Consolidated studies have concentrated on identifying and visualizing dynamically evolving text patterns from news data streams. Detecting and understanding user behavior and relating user intentions to emerging topic trends in news da...